Skip to content

Conversation

@nalimilan
Copy link
Member

Add a type argument to quantile to support the three remaining types that we didn't support. Some of these are useful in particular because they correspond to actual values from the data and work for types that do not support arithmetic.

Fixes #185.

@codecov-commenter
Copy link

codecov-commenter commented Jan 22, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.84%. Comparing base (0ce3149) to head (2b2595a).

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #187      +/-   ##
==========================================
+ Coverage   96.66%   96.84%   +0.17%     
==========================================
  Files           2        2              
  Lines         450      475      +25     
==========================================
+ Hits          435      460      +25     
  Misses         15       15              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Comment on lines +1079 to +1107
if type == 1
return v[clamp(ceil(Int, n*p), 1, n)]
elseif type == 2
i = clamp(ceil(Int, n*p), 1, n)
j = clamp(floor(Int, n*p + 1), 1, n)
return middle(v[i], v[j])
elseif type == 3
return v[clamp(round(Int, n*p), 1, n)]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have used simplified formulas specific to each case, as I find code resulting from using the single general formula from the Hyndman & Fan paper very hard to grasp without any advantage. I hope I didn't introduce mistakes, especially in corner cases. Please suggest things to test if you can find some that are not covered.

@test quantile(v, 1.0, alpha=1.0, beta=1.0) 21.0

# tests against R's quantile with type=1
@test quantile(v, 0.0, type=1) === 2
Copy link
Member Author

@nalimilan nalimilan Jan 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you can see the type of the result can be different for types 1 and 3 as we keep the original type instead of using whatever type arithmetic operations produce. This makes sense and is even necessary if we want to work for types that don't support arithmetic.

But this means the inferred return type when passing type is a Union of two types. Maybe OK as it's small enough to be optimized out? If not there are probably ways to ensure inference works via combined use of Val(type) and @inline in a wrapper function.

EDIT: The situation is slightly worse for quantile([1, 2], [0.1, 0.2]) as the inferred type is Union{Vector{Float64}, Vector{Int64}, Vector{Real}}.

(Note that when omitting type, the inferred type is concrete as before so at least there's no regression.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add tests for p=0 and p=1 (i.e. integer p), or even p=false and p=true (I undesrand we allow it).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't constant propagation be able to handle this? I.e. make quantile_type1(v, p) = quantile(v, q, type=1) inferred?

m = alpha + p * (one(alpha) - alpha - beta)
# Using fma here avoids some rounding errors when aleph is an integer
# The use of oftype supresses the promotion caused by alpha and beta
aleph = fma(n, p, oftype(p, m))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will error if p=0 or p=1 and m is a fraction.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Though this PR doesn't touch this code, let's handle it separately?

Reproducer:

quantile(1:3, 0, alpha=0.2, beta=0.0)

# Using fma here avoids some rounding errors when aleph is an integer
# The use of oftype supresses the promotion caused by alpha and beta
aleph = fma(n, p, oftype(p, m))
j = clamp(trunc(Int, aleph), 1, n - 1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this clamp correct if p=1? It seems to me that then j should be n and later we just need to make sure not to use j+1.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so, because γ is 1 in that case and we return (1-γ)*v[j] + γ*v[j + 1] (so we don't care about v[j]).

Add a `type` argument to `quantile` to support the three remaining
(non-interpolating) types that we didn't support. Some of these are
useful in particular because they correspond to actual values from
the data and work for types that do not support arithmetic.
@nalimilan
Copy link
Member Author

Gentle bump. :-)

Copy link
Member

@andreasnoack andreasnoack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Just some minor comments/questions

@test quantile(v, 1.0, alpha=1.0, beta=1.0) 21.0

# tests against R's quantile with type=1
@test quantile(v, 0.0, type=1) === 2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't constant propagation be able to handle this? I.e. make quantile_type1(v, p) = quantile(v, q, type=1) inferred?

Comment on lines +1090 to +1091
alpha = (0.0, 1/2, 0.0, 1.0, 1/3, 3/8)[type-3]
beta = (1.0, 1/2, 0.0, 1.0, 1/3, 3/8)[type-3]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know it is no different from the current implementation but are we certain that these values should be Float64? E.g. if all arguments were Float32 and you wanted to avoid intermediate promotion. For elements with higher precision, I suppose this would also mean that there'd be a loss of precision because 1/3 is not representable as a binary float. Since it only affects type=8, and therefore a fairly negligible corner case, we can maybe just open an issue to have the loss on record instead of dealing with it here.

- Def. 7: `alpha=1`, `beta=1` (Julia, R and NumPy default, Excel `PERCENTILE` and `PERCENTILE.INC`, Python `'inclusive'`)
- Def. 8: `alpha=1/3`, `beta=1/3`
- Def. 9: `alpha=3/8`, `beta=3/8`
The keyword argument `type` can be used to choose among the 9 definitions
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could consider calling the keyword definition to follow the naming of the paper. I guess type is what the R functions use.

- Def. 9: `alpha=3/8`, `beta=3/8`
The keyword argument `type` can be used to choose among the 9 definitions
in Hyndman and Fan (1996). Alternatively, `alpha` and `beta` allow reproducing
any of the methods 4-9 defined in this paper. It is not allowed to specify both
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
any of the methods 4-9 defined in this paper. It is not allowed to specify both
any of the definitions 4-9 of this paper. It is not allowed to specify both

just to avoid also introducing "method" as another synonym on top of "definition" and "type".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support for non-interpolating quantile computation

5 participants